The question has come up again this year about the content of Math
280. There are at least two responses to the complaint that the
course as given is not compatible with that outlined in the CIS
prospectus. The first answer is that this year's course will indeed
cover the material listed except for a detailed examination of
sorting and searching techniques and a study of generalized business
data management systems. Most of the material was covered last
semester but certainly not in the order suggested.
The second response is more to the point: the course outline is not
indicative of a modern course in data structures. At best it reflects
the thinking of a certain group established in 1967-1968. Other areas
of computer science have changed significantly in the intervening
years, and data structures courses are no exception. So the first
objection is that the course outline as given --a copy of that
advertised in Curriculum '68-- is out of date. Second, the outline
itself gives a distorted view even of the material suggested by the
accompanying article in the CACM; the committee's report, which
discusses the purpose and content of such a course in more detail,
is a more reasonable guide. I seriously doubt that "orthogonal
lists" are a crucial part of a course on data
structures. Indeed I suspect that the only reason for their
occurrence in the abstract is that they are listed as a topic in
Knuth's Chapter 2, Volume 1. (There is more than a passing similarity
between the data structures abstract and the contents of that
Volume.)
A more fundamental argument concerns the background of the contemporary
student in computer science. In the earlier days books and courses
which dealt mainly with tricks and techniques were apropos. The
students were typically semi-professionals working in the field and
needed to upgrade their skills to handle clever coding tricks for
representing data, just as a premium was then placed on being able to
write clever self-modifying code. But the field and its students have
changed. Many students are coming from a purely scholastic
environment with no background which would tend to motivate the study
of techniques common in structuring data. Thus motivation must be an
integral part of an introductory course in data structures. This
is even more true in a program like that at San Jose where the
students come from such a varied background. The course must supply
motivation while requiring no prerequisite but intelligence. If we
cannot presuppose that as a prerequisite then we are in trouble.
The field has changed as well as the student. We are now beginning to
realize that trickery and techniques are not the appropriate way to
program computers. This realization is expressed in the ideas of
structured programming, programming with abstraction, modularity... .
The basic idea being expressed is "clarity". Programs must be
clear and readable, easily modifiable, and hopefully amenable to
informal justification if not formal proof. Machine-dependent
encodings of data structures and their associated algorithms are most
definitely not in keeping with this spirit of modern programming.
Finally I would say that computer science is a professional field in
the same sense as law, business or medicine. It is not an area
to be treated by survey courses for "tourists" any more than we
have survey courses in graduate mathematics. Until we stop treating
the area as if it were a service bureau for applied mathematics, the
program is doomed to fail.
Sept 4
two points:
it's a math course
it's graduate level
therefore a reasonable amount of work is expected
discussion of Curriculum '68 (CACM)
the field has changed since then.
the typical textbook approach offers no motivation
it deals with tricks and techniques
therefore look for motivation in the origins of those techniques
the ideas come from programming of complex software: languages and systems
therefore we can kill two birds with one stone: use a high-level language (notation)
for discussing data structures, independent of their implementation; and
then study the implementation in terms of the techniques necessary to implement
that language.
why high level
.CENT(DATA STRUCTURES)
The objectives of this course are to give the beginning student
in computer science a perspective on the field of data structures.
Specifically, the students enrolled in this course will be taught
the following:
.begin indent 0,7;
The importance of abstraction in programming methodology. This will
be done by using a high-level notation for describing algorithms for
abstract data structures. The obvious analogy is to note that we don't
require students of numerical analysis to know machine language; we
should not need to discuss data structures at that level either.
We use a formalism based on LISP's meta-expressions since the necessary
ingredients are there. We discuss symbolic expressions as a primitive
data structure just as 0 is the basis for many mathematical investigations.
We examine representations for such primitives; in particular graphical
representations are convenient and will map in an obvious way when we
discuss implementation. We develop primitive operations on these abstract
S-expressions (just as successor is developed for number theory); and finally
we introduce control structures: conditional expressions, call-by-value,
and recursion. It is these three components --data structures, operations, and
control structures-- which form the basis for all discussions of algorithms.
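To give the flavor of the notation (rendered here in a modern Lisp; the
function name is mine, not part of the course), here is a definition that
counts the atoms in an S-expression using nothing but those three
components:

    ;; count-atoms: the number of atoms in an S-expression.
    ;; Data structure: S-expressions; operations: atom, car, cdr;
    ;; control: conditional expression plus recursion.
    (defun count-atoms (e)
      (cond ((atom e) 1)
            (t (+ (count-atoms (car e))
                  (count-atoms (cdr e))))))

    ;; (count-atoms '((a . b) . c))  =>  3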
Next we apply the ideas generated in the study of symbolic expressions to
study an independent domain: the domain of sequences. The same level of
abstraction is used and we begin to see some of the inherent ingredients
necessary to discuss data structures: we must be able to %3recognize%*
instances of data structures; we must be able to %3construct%* data
structures from components; we must be able to %3select%* components
from non-elementary structures. Again we develop alternative representations
for sequences; again the graphical representations will have natural
maps to the typical machine architecture⊗↓Indeed, though the student doesn't
know it yet, the underlying representations of %2all%* context-free
data structures have been shown at this point: sequences, structures, and
pointers. The student %2will%* see this when we get to efficiency.←
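A sketch of what that interface looks like (all names are mine, and the
definitions shown are just one candidate representation):

    ;; One rendering of the abstract sequence interface; sequences
    ;; happen to be Lisp lists here, but nothing below depends on it.
    (defun seq-empty-p (s) (null s))     ; recognizer
    (defun seq-cons (x s) (cons x s))    ; constructor
    (defun seq-first (s) (car s))        ; selector
    (defun seq-rest (s) (cdr s))         ; selector

    ;; An algorithm written only against that interface; it survives
    ;; any change of representation untouched.
    (defun seq-append (s1 s2)
      (cond ((seq-empty-p s1) s2)
            (t (seq-cons (seq-first s1)
                         (seq-append (seq-rest s1) s2)))))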
At this point we begin to move from formal study to implementation.
We examine the problem of representation. How do we represent a new
domain in terms of an existing domain? We have to show a mapping from
data structures, control structures, and operations to data structures,
control structures and operations. In particular we develop a map
from sequences to symbolic expressions. Some additional problems appear
which give insight into questions of user-defined data structures in
programming languages: i.e., how do you implement them?
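For instance (the tag and the names are my invention, purely
illustrative), one map of the sequence domain into S-expressions:

    ;; Tag the representation so that instances can still be
    ;; recognized once they are "just" S-expressions.
    (defun mk-seq (&rest elts) (cons 'seq elts))         ; constructor
    (defun is-seq (x) (and (consp x) (eq (car x) 'seq))) ; recognizer
    (defun seq-elts (s) (cdr s))                         ; selector

    ;; The tag is precisely the bookkeeping a language with
    ;; user-defined data structures must do on the user's behalf.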
We then begin an applications section, showing that the tools we have built,
including the ideas of abstraction, will allow us to encode several non-trivial
application areas: in particular symbolic mathematics is covered⊗↓I plan to
have another session later on business applications when I can find a reasonable
example. The field of data base management is ripe for this.←.
The %2idea%* which comes out of this is the problem of representation again. We
must be able to represent the elements of our problem domain as data structures
and represent our intuitive algorithms as data structure manipulating functions.
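As a small instance (the representation and names are my choosing):
represent a term as a number, a variable, or a tagged list
(plus e1 e2) / (times e1 e2), and differentiation becomes a function on
that representation:

    ;; diff: the derivative of e with respect to x, written as a
    ;; data structure manipulating rendering of the usual rules.
    (defun diff (e x)
      (cond ((numberp e) 0)
            ((eq e x) 1)
            ((atom e) 0)
            ((eq (car e) 'plus)
             (list 'plus (diff (cadr e) x) (diff (caddr e) x)))
            ((eq (car e) 'times)
             (list 'plus
                   (list 'times (cadr e) (diff (caddr e) x))
                   (list 'times (diff (cadr e) x) (caddr e))))))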
The next extension to symbolic mathematics is evaluation: given a representation
of a polynomial, say, how do we encode what is meant by the %3value%* of that
expression for given values of the variables? This brings to the surface
a current which has been running through all the discussion: how do we
evaluate expressions in our high-level notation? We have previously
discussed termination, partial functions, call-by-value, and call-by-name.
The obvious way to lay these problems to rest is to encode the
evaluation process as
a data structure manipulating algorithm. Then any questions about what is meant
by the evaluation of expressions in our language are decided by going to
our formal specification. This is obviously important for computer science:
the business of programming languages is evaluating expressions, and unless
we have an unambiguous specification for our language we cannot hope to
implement it!
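A minimal sketch of the point (again the names are mine; this is a toy,
not the course's evaluator): the value of an expression, given variable
values in an association-list environment, is itself just a data
structure manipulating function.

    (defun poly-eval (e env)
      ;; env is an association list, e.g. ((x . 3) (y . 4))
      (cond ((numberp e) e)
            ((atom e) (cdr (assoc e env)))
            ((eq (car e) 'plus)
             (+ (poly-eval (cadr e) env) (poly-eval (caddr e) env)))
            ((eq (car e) 'times)
             (* (poly-eval (cadr e) env) (poly-eval (caddr e) env)))))

    ;; (poly-eval '(plus (times x x) (times 2 x)) '((x . 3)))  =>  15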
The next chapter discusses in detail the problems of evaluation for LISP-like
languages. Again we extract out an abstract algorithm for the evaluation process;
this will give us a representation-independent (read: portable software)
description of our evaluation process. It will also bring a new class of
abstract data structures to the surface: environments or symbol tables.
The difficulties of manipulating symbol tables are not those of efficiency;
the problems are conceptual. The recent discussions of "retention or deletion"
are most clearly seen in a context like this. The problems are those
of procedure-valued variables; and it was LISP which first gave correct
solutions in 1960, before the rest of the world understood the problem.
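A miniature of the conceptual problem (in a modern Lisp rendering; the
example is mine): a procedure-valued result refers to a binding after
its creator has returned, so a pure stack ("deletion") discipline loses
the binding, while a retention strategy, as in LISP's environments,
keeps it.

    (defun adder (n)
      ;; the returned procedure refers to n after adder has returned;
      ;; under a deletion strategy n's binding would already be gone
      #'(lambda (m) (+ m n)))

    ;; (funcall (adder 5) 3)  =>  8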
Next we go down a level of abstraction. We begin to show how such a language
is implementable on current hardware. This has several benefits. First,
the typical techniques and tricks which appear in most data structures courses
will appear as necessary requirements for implementing LISP. This is not
by accident; many of them first appeared in LISP's implementation.
Second it gives the student a breather and shows a perhaps more concrete
implementation than that described in the abstract description of evalaution.
We discuss linked allocation, double linking and its obvious extensions. These
techniques come up most naturally here: how do we implement binary tree structure
on a linear memory so that the most general manipulations are possible?
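A sketch of linked allocation on a linear memory (the names, size, and
conventions are all mine): two parallel arrays hold the car and cdr
fields, and the free cells are chained through their cdr fields --
itself a linked list.

    (defparameter *size* 100)
    (defparameter *car-mem* (make-array *size* :initial-element nil))
    (defparameter *cdr-mem* (make-array *size* :initial-element nil))
    (defparameter *free* 0)          ; head of the free list

    (defun init-memory ()
      ;; thread the free list through the cdr fields: cell i -> i+1
      (dotimes (i *size*)
        (setf (aref *cdr-mem* i) (1+ i)))
      (setq *free* 0))

    (defun alloc-cons (a d)
      ;; take the first free cell and fill its two fields
      (let ((cell *free*))
        (setq *free* (aref *cdr-mem* cell))
        (setf (aref *car-mem* cell) a
              (aref *cdr-mem* cell) d)
        cell))
    ;; exhaustion is unhandled here -- that is what garbage
    ;; collection, discussed below, is for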
We begin a high level discussion of machine architecture: what is necessary
for a LISP machine. In the process the questions of efficient implementation
come up. We see that many of the LISP operations involve symbol table
manipulation and though our mechanism for environments is logically
sufficient, it is not particularly efficient. It uses linear search, which has
not yet been defined even though the student has been %2using%* it all along.
We study the problems of efficient symbol table manipulation. This gives
rise to hashing functions and their implementation.
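A sketch (names mine; sxhash stands in for the hash functions developed
in the text): bucketed storage turns one long linear search into a
short one.

    (defparameter *nbuckets* 64)
    (defparameter *buckets* (make-array *nbuckets* :initial-element nil))

    (defun bucket (name)
      (mod (sxhash name) *nbuckets*))

    (defun enter (name value)
      (push (cons name value) (aref *buckets* (bucket name))))

    (defun lookup (name)
      ;; still linear search, but only within one short bucket
      (cdr (assoc name (aref *buckets* (bucket name)))))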
Within this discussion we see LISP's scanner and parser; all but a small portion
of these algorithms are best described in LISP itself.
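A sketch of the parser's core, in Lisp itself (tokenizing is simplified
away to a prepared list of tokens; the names are mine):

    (defvar *tokens*)                 ; tokens, consumed left to right

    (defun next-token () (pop *tokens*))

    (defun parse ()
      (let ((tok (next-token)))
        (if (equal tok "(") (parse-list) tok)))

    (defun parse-list ()
      (if (equal (first *tokens*) ")")
          (progn (next-token) nil)
          (cons (parse) (parse-list))))

    ;; (setq *tokens* '("(" a "(" b ")" c ")"))
    ;; (parse)  =>  (A (B) C)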
We discuss the implementation of LISP's primitive functions and see that
the constructor for LISP requires a sophisticated memory management
scheme. This discussion leads to garbage collection (first used by LISP,
but now a systems programming trick) and reference counting. A simple mark-sweep
algorithm is given in LISP, though we will return to compacting garbage collection
in the later section on efficiency.
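A sketch of mark-sweep over the toy linear memory above (tag bits are
ignored: for the sketch an integer is taken to be a cell index,
anything else an atom):

    (defparameter *marks* (make-array *size* :initial-element nil))

    (defun mark (x)
      (when (and (integerp x) (not (aref *marks* x)))
        (setf (aref *marks* x) t)
        (mark (aref *car-mem* x))    ; note the recursion: it needs a
        (mark (aref *cdr-mem* x))))  ; stack -- see "link-bending" later

    (defun sweep ()
      ;; unmarked cells go back on the free list; marks are cleared
      (dotimes (i *size*)
        (cond ((aref *marks* i) (setf (aref *marks* i) nil))
              (t (setf (aref *cdr-mem* i) *free*)
                 (setq *free* i)))))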
We give a completely
revised version of the symbol table schemes, but show that the obvious
implementation won't work. The analysis of the difficulties shows the
inadequacy of stack architecture for the implementation of sophisticated
software. But we already knew that from the high-level discussion of
evaluation. The whole question of abstract control structures has been
gone into when we discussed environments as abstract data structures.
In the discussion of symbol tables, hashing, and dictionaries we see that we
have really developed a much more general mechanism than just a simple
table. We can store multiple properties. Exploiting that power
leads to a discussion of alternative calling sequences for procedures.
Call-by-value is nice but... We discuss call-by-name and macro definitions;
we discuss the problems in their implementation. Finally we discuss
alternative symbol table organizations, leading to an understanding
of the "shallow-vs-deep" binding controversy and the recent work
on "spaghetti" stacks; again, all this is old hat to LISP.
Now that we have an understanding of the static structure of a LISP-like
machine, what do we need to make it run? This is the topic of the
next chapter: the dynamic structure of LISP. That is, what kind of
code should we expect to write to encode the LISP evaluation process on
contemporary hardware? This is just another problem in representation.
What we do is write a LISP function which takes representations of
LISP functions as input and outputs a representation of the code for our
machine. This is called a compiler. But before that can be done
we must define our low-level machine. The one we use is a subset of the
PDP-10. We then describe our compiler as a high-level algorithm. We then
note that the compiler can itself be compiled. This discussion covers
the various forms of bootstrapping and its implications. Again LISP's
compiler was the first instance of bootstrapping compilers.
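A sketch of the idea (the target is an invented little stack machine,
not the PDP-10 subset the course actually uses; all names are mine): a
compiler is itself just a data structure manipulating function, from a
representation of an expression to a representation of code.

    (defun compile-expr (e)
      (cond ((numberp e) `((load ,e)))         ; constant -> accumulator
            ((atom e)    `((load-var ,e)))     ; variable reference
            (t (append (compile-expr (cadr e))
                       '((push))               ; save the left operand
                       (compile-expr (caddr e))
                       (case (car e)
                         (plus  '((pop-add)))
                         (times '((pop-mul))))))))

    ;; (compile-expr '(plus x 3))
    ;;   =>  ((LOAD-VAR X) (PUSH) (LOAD 3) (POP-ADD))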
In the context of compilation we introduce one-pass assemblers. This is
done for two reasons: first, we need a simple assembler and one-pass assemblers
are interesting; second, the techniques for fixups and forward references
are an application of linked-lists.
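A sketch of the fixup-chain technique (the names and the nil terminator
are my choices): each undefined label heads a chain threaded through
the very code words that await it.

    (defparameter *code* (make-array 256 :initial-element nil))
    (defvar *chains* nil)   ; alist: label -> address of chain head

    (defun refer (label addr)
      ;; emit a reference to a not-yet-defined label at ADDR: the word
      ;; at ADDR temporarily holds the previous head of the chain
      (let ((entry (assoc label *chains*)))
        (unless entry
          (setq entry (cons label nil))
          (push entry *chains*))
        (setf (aref *code* addr) (cdr entry))
        (setf (cdr entry) addr)))

    (defun define-label (label value)
      ;; walk the chain threaded through the code, patching each word
      (let ((addr (cdr (assoc label *chains*))))
        (loop while addr
              do (let ((next (aref *code* addr)))
                   (setf (aref *code* addr) value)
                   (setq addr next)))))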
The problems in writing compilers are most apparent in the handling of variable references.
In the compilers we write we can see that a stack %2could%* be used for local
variable references, but to handle global lookup a more careful scheme is needed;
the whole question of symbol tables comes up again.
The final part of the chapter on dynamics discusses the problems of running programs.
The techniques being developed by LISP programmers in the field of interactive programming
are the best in the area. We discuss the implementations of interactive programming:
tracing, demons, and the whole problem of programming environments.
The last chapter goes down to the most basic level of data structures. It deals
with efficiency and alternative representations in storage.
We cover bit-tables; typical implementations for vectors and arrays, dope vectors,
mother vectors, and the like; string processors and their implementation; and
compacting garbage collectors and storage management.
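A sketch of the dope-vector idea (layout and names are illustrative):
the descriptor carries exactly what is needed to map subscripts onto
linear storage.

    (defstruct dope base rows cols data)   ; the "dope": a descriptor

    (defun make-2d-array (rows cols)
      (make-dope :base 0 :rows rows :cols cols
                 :data (make-array (* rows cols) :initial-element 0)))

    (defun ref-2d (d i j)
      ;; row-major map: address = base + i*cols + j
      (aref (dope-data d) (+ (dope-base d) (* i (dope-cols d)) j)))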
Next we examine LISP's list-modifying instructions. This brings up pointer
manipulation and the problems of shared data structure. We give
applications of pointer modification involving link-bending garbage collection;
analysis of the method shows that what is being done is hiding the stack
necessary for the recursive collector in the links themselves. This
is exactly what the threading of lists does.
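A sketch of the link-bending (Schorr-Waite) marking idea, with
structures standing in for cells; the slot names and the flag
convention are mine. Instead of a stack, we reverse the link we came
down and keep a single back pointer, unbending the links on the way up.

    (defstruct cell car cdr mark flag)

    (defun sw-mark (root)
      (let ((cur root) (prev nil))
        (loop
          ;; descend, bending car links back and marking as we go
          (loop while (and (cell-p cur) (not (cell-mark cur)))
                do (setf (cell-mark cur) t)
                   (let ((next (cell-car cur)))
                     (setf (cell-car cur) prev   ; bend the link back
                           (cell-flag cur) 'car
                           prev cur
                           cur next)))
          ;; retreat past cells whose cdr side is finished, unbending
          (loop while (and prev (eq (cell-flag prev) 'cdr))
                do (let ((up (cell-cdr prev)))
                     (setf (cell-cdr prev) cur   ; restore the cdr link
                           cur prev
                           prev up)))
          (when (null prev) (return root))
          ;; switch prev from its car side to its cdr side
          (let ((next (cell-cdr prev)))
            (setf (cell-cdr prev) (cell-car prev) ; move the back pointer
                  (cell-car prev) cur             ; restore the car link
                  (cell-flag prev) 'cdr
                  cur next)))))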
The final material goes back over the whole question of data structures and
representations, and attempts to gain insight into what has happened in the course.
The course attempts to raise as many questions as possible. Hopefully some of the
questions have been answered in the course. But not all.